Based on our assumption, for $w_i$ we formulate the ideal bimodal distribution as
$$P(w_i \mid \Theta_i) = \sum_{k=1}^{2} \beta_i^k \, p(w_i \mid \Theta_i^k), \tag{6.49}$$
where the number of distributions is set to 2 in this paper. $\Theta_i^k = \{\mu_i^k, \sigma_i^k\}$ denotes the parameters of the $k$-th distribution, where $\mu_i^k$ denotes the mean and $\sigma_i^k$ the variance.
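To make the formulation concrete, the following Python sketch evaluates the two-component mixture density of Eq. (6.49) for one layer's weight ensemble. The helper names (`gaussian_pdf`, `mixture_pdf`) and the NumPy implementation are our assumptions, not the authors' code; `sigma` is treated as the variance, as in the text.

```python
import numpy as np

def gaussian_pdf(w, mu, sigma):
    # Component density (1/Omega) * f(w, mu, sigma), with
    # Omega = sqrt(2*pi*|sigma|) and sigma playing the role of the variance.
    return np.exp(-0.5 * (w - mu) ** 2 / sigma) / np.sqrt(2 * np.pi * np.abs(sigma))

def mixture_pdf(weights, betas, mus, sigmas):
    # P(w_i | Theta_i) = sum_k beta_i^k * p(w_i | Theta_i^k), Eq. (6.49), K = 2.
    return sum(b * gaussian_pdf(weights, m, s)
               for b, m, s in zip(betas, mus, sigmas))
```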
To solve the GMM with the observed data $w_i$, i.e., the weight ensemble in the $i$-th layer, we introduce the hidden variable $\xi_i^{jk}$ to formulate the maximum likelihood estimation (MLE) of the GMM as
$$\xi_i^{jk} = \begin{cases} 1, & w_i^j \in p_i^k \\ 0, & \text{else} \end{cases}, \tag{6.50}$$
where $\xi_i^{jk}$ is the hidden variable that describes the affiliation of $w_i^j$ with $p_i^k$ (shorthand for $p(w_i \mid \Theta_i^k)$). We then define the likelihood function $P(w_i^j, \xi_i^{jk} \mid \Theta_i^k)$ as
$$P(w_i^j, \xi_i^{jk} \mid \Theta_i^k) = \prod_{k=1}^{2} (\beta_i^k)^{|p_i^k|} \prod_{j=1}^{m_i} \left[ \frac{1}{\Omega} f(w_i^j, \mu_i^k, \sigma_i^k) \right]^{\xi_i^{jk}}, \tag{6.51}$$
where $\Omega = \sqrt{2\pi |\sigma_i^k|}$, $|p_i^k| = \sum_{j=1}^{m_i} \xi_i^{jk}$, and $m_i = \sum_{k=1}^{2} |p_i^k|$. The function $f(w_i^j, \mu_i^k, \sigma_i^k)$ is defined as
$$f(w_i^j, \mu_i^k, \sigma_i^k) = \exp\!\left(-\frac{(w_i^j - \mu_i^k)^2}{2\sigma_i^k}\right). \tag{6.52}$$
Hence, for every single weight $w_i^j$, $\xi_i^{jk}$ can be computed by maximizing the likelihood as
$$\max_{\xi_i^{jk},\,\forall j,k} \; \mathbb{E}\!\left[\log P(w_i^j, \xi_i^{jk} \mid \Theta_i^k) \,\middle|\, w_i^j, \Theta_i^k\right], \tag{6.53}$$
where $\mathbb{E}(\cdot)$ denotes the expectation. Therefore, the maximum likelihood estimate $\hat{\xi}_i^{jk}$ is calculated as
$$\hat{\xi}_i^{jk} = \mathbb{E}(\xi_i^{jk} \mid w_i^j, \Theta_i^k) = P(\xi_i^{jk} = 1 \mid w_i^j, \Theta_i^k) = \frac{\beta_i^k \, p(w_i^j \mid \Theta_i^k)}{\sum_{k=1}^{2} \beta_i^k \, p(w_i^j \mid \Theta_i^k)}. \tag{6.54}$$
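A minimal sketch of this expectation step, continuing the code above and reusing the hypothetical `gaussian_pdf` helper (our naming, not the authors'):

```python
def e_step(weights, betas, mus, sigmas):
    # Responsibilities xi_hat of Eq. (6.54): posterior probability that each
    # weight w_i^j belongs to component k, normalized over the two components.
    probs = np.stack([b * gaussian_pdf(weights, m, s)
                      for b, m, s in zip(betas, mus, sigmas)])  # shape (2, m_i)
    return probs / probs.sum(axis=0, keepdims=True)
```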
After the expectation step, we perform the maximization step to compute $\Theta_i^k$ as
$$\hat{\mu}_i^k = \frac{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk} \, w_i^j}{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk}}, \tag{6.55}$$
$$\hat{\sigma}_i^k = \frac{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk} \, (w_i^j - \hat{\mu}_i^k)^2}{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk}}, \tag{6.56}$$
$$\hat{\beta}_i^k = \frac{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk}}{m_i}. \tag{6.57}$$
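The maximization step and the resulting EM alternation can be sketched as follows, continuing the code above; the initialization at $\pm$ one standard deviation of the weights and the fixed iteration count are our assumptions, not prescribed by the text.

```python
def m_step(weights, xi):
    # Eqs. (6.55)-(6.57): responsibility-weighted mean, variance,
    # and mixing coefficient for each of the two components.
    n_k = xi.sum(axis=1)                                              # sum_j xi_hat^{jk}
    mus = (xi * weights).sum(axis=1) / n_k                            # Eq. (6.55)
    sigmas = (xi * (weights - mus[:, None]) ** 2).sum(axis=1) / n_k   # Eq. (6.56)
    betas = n_k / weights.size                                        # Eq. (6.57)
    return betas, mus, sigmas

def fit_bimodal_gmm(weights, n_iter=50):
    # Alternate E- and M-steps on one layer's flattened weights.
    mus = np.array([-weights.std(), weights.std()])   # assumed bimodal init
    sigmas = np.array([weights.var(), weights.var()])
    betas = np.array([0.5, 0.5])
    for _ in range(n_iter):
        xi = e_step(weights, betas, mus, sigmas)
        betas, mus, sigmas = m_step(weights, xi)
    return betas, mus, sigmas
```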